Enhanced Named Entity Transliteration Model Using Machine Learning Algorithm
Authors
Abstract
Named entities are important elements in Cross Language Information Retrieval (CLIR) systems for locating relevant documents. For translating named entities, particularly person names, a compression word format model has been designed in the literature. That model only maps a given source name to a target name stored in the database; it does not generate the target name, so it applies only to source names that have equivalents in the database. To overcome these problems, a new model called mapping and generation with dynamic learning is proposed. This model transliterates named entities, particularly person, place, and organization names. The proposed model compresses the given source name and compares it with the names in the target database. If the source name has a correct equivalent target name, the model accurately maps the source name to it and retrieves the equivalent and relevant target names. Otherwise, it transliterates the source name and retrieves the relevant target names from the web. These new words need to be added to the database automatically. To automate the update process, a dynamic learning algorithm is designed and deployed. With the learning algorithm, the accuracy of the proposed system improves compared with a manual update process.
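The map-or-generate flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the compression format, so `compress` here is a hypothetical rule (lowercase, drop non-initial vowels, collapse doubled letters), and `NameMapper`, `learn`, `lookup`, and the `transliterate` callback are names invented for this example. The web retrieval step is left out.

```python
from typing import Callable, Dict, List, Optional, Tuple

VOWELS = set("aeiou")


def compress(name: str) -> str:
    """Hypothetical compressed form (an assumption, not the paper's format):
    lowercase, collapse doubled letters, drop vowels after the first letter."""
    letters = [c for c in name.lower() if c.isalpha()]
    out: List[str] = []
    prev = ""
    for i, c in enumerate(letters):
        if c == prev:  # collapse an immediately repeated letter
            continue
        prev = c
        if i > 0 and c in VOWELS:  # keep only a leading vowel
            continue
        out.append(c)
    return "".join(out)


class NameMapper:
    """Maps a source name to stored target names, or generates one and learns it."""

    def __init__(
        self,
        transliterate: Callable[[str], str],
        database: Optional[Dict[str, str]] = None,
    ) -> None:
        self.transliterate = transliterate  # placeholder for a real transliterator
        self.database: Dict[str, List[str]] = {}
        for source, target in (database or {}).items():
            self.learn(source, target)

    def learn(self, source: str, target: str) -> None:
        """Dynamic update: store the target under the compressed source key."""
        targets = self.database.setdefault(compress(source), [])
        if target not in targets:
            targets.append(target)

    def lookup(self, source: str) -> Tuple[List[str], str]:
        """Return (target names, 'mapped' or 'generated')."""
        key = compress(source)
        if key in self.database:
            return list(self.database[key]), "mapped"
        target = self.transliterate(source)  # generation step
        self.learn(source, target)           # automatic database update
        return [target], "generated"
```

Under this toy compression rule, spelling variants such as "Mohammed" and "Muhammad" share one key, so either spelling maps to the same stored target. A name that is missing from the database is generated once through the callback and then found by mapping on the next lookup, which is the role the abstract assigns to the dynamic learning step.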
Similar resources
Weakly Supervised Named Entity Transliteration and Discovery from Multilingual Comparable Corpora
Named Entity recognition (NER) is an important part of many natural language processing tasks. Current approaches often employ machine learning techniques and require supervised data. However, many languages lack such resources. This paper presents an (almost) unsupervised learning algorithm for automatic discovery of Named Entities (NEs) in a resource free language, given a bilingual corpora i...
Named Entity Transliteration and Discovery in Multilingual Corpora
Named Entity recognition (NER) is an important part of many natural language processing tasks. Current approaches often employ machine learning techniques and require supervised data. However, many languages lack such resources. This paper presents an (almost) unsupervised learning algorithm for automatic discovery of Named Entities (NEs) in a resource free language, given a bilingual corpora ...
Optimizing Transliteration for Hindi/Marathi to English Using only Two Weights
Machine transliteration has received significant research attention in the last two decades. It is observed that Hindi to English and Marathi to English named entity machine transliteration is comparatively less studied. Currently, research work in this domain is carried out using grapheme-based statistical approaches. But, to achieve better accuracy for the transliteration, an adequate bilingual t...
A Hybrid Approach of English-Hindi Named-entity Transliteration
In recent years, machine transliteration has become a center of attention for research. Both machine translation and transliteration are important for e-governance and web-based online multilingual applications. Machine translation translates the source language into the target language, which results in wrong translations for named entities. Named entities are required to be translated with preserving t...
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three popular methods have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and a hybrid of the two. Machine-learning-based methods perform well in the Persian language if they are trained with good features. To get good performanc...
Journal:
- Int. J. Adv. Comp. Techn.
Volume 2, Issue
Pages -
Publication date: 2010